Welcome to Python, Jupyter Notebook, and Python in Forensics!
All three of these topics will be covered in this notebook. If you have experience with any, feel free to jump to the next section. If you feel comfortable in all three, feel free to select a different notebook, such as timelines.
There are 2 types of blocks in this document: Text Blocks and Code Blocks.
The text blocks are prompts that describe the material. I've added blurbs to (hopefully) improve comprehension of the material.
Code blocks allow us to execute Python in a sequential order. They can run individually, though they can also reference each other. When in doubt, or if an error occurs, try running all of the code blocks in order from top to bottom to see if it resolves the error. The number to the left of the code blocks (will look like [27] for example) represents the step at which the block was run. (Re-)running a block will increment it from the previous step number.
Code blocks can also be edited, and in some cases in these notebooks you will have to do some editing. After running the code block, the output (if any) will display below the segment, along with any errors. If you cannot fix an error, feel free to open a ticket or to grab a fresh copy from GitHub.
To run a code block, use the play button on the top bar or press CTRL-ENTER (Windows) or CMD-ENTER (Mac) on your keyboard.
I find the biggest difference between Python scripting and Python scripting for forensics to be the use of 3rd party libraries and file handling. As opposed to other uses of Python, we are generally concerned with file metadata more than content; though when we are interested in content, we are interested in the structure and metadata about the content as well. With this, the DFIR community has released a lot of artifact parsing libraries to make our lives easier; we will work through many of these libraries throughout the notebooks.
I cannot do nearly as good a job as Learn Python the Hard Way, so I recommend using their site to walk you through the basics. If you are looking to use Python for forensic purposes, I would recommend the first 39 examples, which should take around an hour or two to work through. The remaining examples are important, though not as applicable in early scripting. If you're like me - take an hour or two, work through those first 39, go do something else for a bit, run through them again, then try your hand at building your own code.
THE BELOW IS IN DEVELOPMENT - Please consider following these links to learn more about the basics!!
The below is a quick introduction to the basic Python datatypes, logic, and general syntax.
This is intended for someone who has some programming/scripting background and just wants a refresher on how to use Python to accomplish their goals. If you are brand new, please check out the LPTHW link above and work through those examples.
Before getting started, please run the below code block! Ensure that the box on the left has at least a [1] in the corner before moving on!
In [ ]:
from __future__ import print_function
As a quick aside - we will want to use Python3 wherever possible, as eventually Python2 will be deprecated. That said, I have included notes below where there are significant deviations in how items are used between the two versions. The variations are not limited to v2 vs v3, and I will identify (where it makes sense) when there is a notable difference in behavior between v3.5 and v3.6 (for example).
As a second note - Python is an Object Oriented Language, and therefore almost all items in Python are considered objects. For now, when I use the term object I am just talking abstractly about the data type, value, or structure in Python. Objects will make more sense eventually, though for now we don't have to dive too deeply into them.
In [ ]:
# To start, this is a comment, a comment is anything that follows a `#` symbol
# Comments are great for documentation and excluding lines of code from running
# Anything I type here will not be executed
""" In addition, this can be used as a comment and must start and end with: """
''' This also can be used as a comment. Comments using this method (or the one above)
can expand over several lines. This can be handy in commenting out code
that you have written and don't want to run anymore (or for the moment) that spans multiple lines.
It is easier than adding a pound character in front of each line.
''';
Python has several operator types, the most common shown below:
+ - addition
- - subtraction
* - multiplication
/ - division
% - modulo (yields the remainder from a division operation)
** - exponentiation (for example, 3 ** 2 squares the number 3)
() - logical groupings (stuff you want to group goes inside)
== - evaluate values to check if they are equal
!= - evaluate values to check if they are not equal
>, < - check if value is greater or less than
>=, <= - check if value is greater than or equal to or less than or equal to
= - assign values (to variables)
; - line ending, can be used to separate statements on a single line though is bad practice. Used in the notebooks to prevent unnecessary printing of values
None - not exactly an operator, but stores a value of none and is a separate data type from the ones described below.
Python has a variety of data types available to assist us in storing, displaying, and manipulating values.
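As a quick sketch of the operators listed above, before we get to the data types, the below runs a few of them on small numbers. The variable names are just for illustration and are not used elsewhere in the notebooks.

```python
from __future__ import print_function

# Arithmetic operators
total = 7 + 3            # addition
difference = 7 - 3       # subtraction
product = 7 * 3          # multiplication
remainder = 7 % 3        # modulo: the remainder of 7 divided by 3
power = 7 ** 2           # 7 raised to the power of 2
grouped = (7 + 3) * 2    # the grouping in parentheses is evaluated first

print(total, difference, product, remainder, power, grouped)
# prints: 10 4 21 1 49 20

# Comparison operators give back a True or False value
print(7 == 3, 7 != 3, 7 >= 3)
# prints: False True True
```

Notice that ** is not limited to squaring; 2 ** 3 gives 8.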
Go check out https://docs.python.org/3/tutorial/datastructures.html for official documentation on the below
Strings are the data type used by Python to store information as text. Strings can vary in length and character sets (more on character sets later). Strings are commonly defined by " or ' and these two types of quotes can be interchanged. We can also cast (or change one data type to another) using str() as shown below.
As shown in the Comments section above, we can use triple-quotes to define line-wrapped strings. If we don't assign a string to a variable, the interpreter still evaluates the contents, though effectively does nothing with it. It is essentially the same as typing a number into your script; it is syntactically correct though has no effect on the outcome of your code (and is generally bad practice).
In [ ]:
str() # a way to express a string
'A string is any text (12345) that is surrounded with quotes'
"Notice that a string is started and completed with quotes"
"We can have #comments in strings!"
"""This is also a string, and also the reason for the * earlier, since
code wrapped in the triple quotes turns into a string and is not executed, therefore acts as a comment
"""
'''
don't forget about this string type too
''';
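To see the string points above in action, here is a minimal sketch that assigns strings to variables so we can actually use them. The values (such as case_number) are hypothetical and only there for illustration.

```python
from __future__ import print_function

single = 'evidence'
double = "evidence"
print(single == double)   # True: the type of quote does not change the value

# Using one type of quote lets us place the other type inside the string
note = "the examiner's notes"
print(note)

# str() casts another data type to a string
case_number = 1024                    # hypothetical value, stored as an integer
label = 'case-' + str(case_number)    # without str() this raises a TypeError
print(label)                          # case-1024

# Triple quotes allow a string to span several lines
multi_line = """line one
line two"""
print(len(multi_line.splitlines()))   # 2
```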
There are two main data types for storing numbers: int() and float(). The int() data type stores integers and cannot handle any value with a fractional part. The float() data type can take both integers and real numbers, and therefore is commonly used when we have a fractional value to store.
Homework: Why are int() datatypes more commonly found than float()?
In [ ]:
# An integer is any whole number, expressed without quotes
int() # A way to express an integer
123
5654654
112000
1
# A float is a real number, also expressed without quotes
float() # a way to express a float
1231.24321
87654.241354
0.41654;
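As a rough sketch of how int() and float() behave when we cast values between them, try the below. The variable names are my own and just for illustration.

```python
from __future__ import print_function

whole = int('42')         # cast a string of digits to an integer
print(whole + 1)          # 43

truncated = int(3.99)     # int() drops the fractional part, it does not round
print(truncated)          # 3

as_float = float(3)       # an integer value stored as a float
print(as_float)           # 3.0

# type() and isinstance() tell us which data type we are holding
print(type(whole), type(as_float))
print(isinstance(truncated, int), isinstance(as_float, float))   # True True
```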
Let's take a moment to experiment with the datatypes we have used so far and also introduce the print() function. This allows us to display information from our script on the standard output. In the case of Notebooks, this shows up in your browser, though if you are running from the command line, this will display in your terminal.
In [ ]:
print(123*345) # Integer math
print(25.3-45.2) # Float math
print('string1' + ' is before '+ "string2") # String Math
print("hello " * 3) # String math
So let's take a quick moment to talk about integers and floats again. If you are running Python2 and run 3/2 your result will be 1, though if you are running Python3 your result will be 1.5. So which is right?
Well (in a way) both are. In Python2 you are taking an integer (3) and dividing it by an integer (2) so the result is also an integer (1), as the data type cannot hold fractional numbers. In Python3, the / operator was changed to perform what is called true division, which returns the approximated value as a float() data type even when both values are integers.
This is helpful, as it prevents errors where we intended to include a fractional number, though also causes other errors when we really wanted to have integer division. To clarify this across versions we can use these steps:
If we want a fractional result, cast one of the values with float(), for example 3/float(2)
If we want integer division, use the floor division operator //
Sorry, that wasn't as brief as I thought, though I wanted to raise it to help prevent future errors in your code.
In [ ]:
print(3/2) # in Python3 this will be 1.5, in Python2 it will be 1
print(3//2) # in either version, this will be 1
print(3/float(2)) # in either version this will be 1.5
...moving on from numbers... to more numbers (sort of)!
Booleans are a way of expressing True and False, or more simply 1 and 0. Some may recognize this as being closely related to binary, or to operators that show on and off.
In Python we can use any of the below to express boolean operations. To probably state the obvious, 1 == True and 0 == False. By default the bool() object is set to False.
In [ ]:
bool() # A way to express a bool
True
False
1
0;
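To check the claims above for ourselves, here is a small sketch. It also shows how bool() can cast other values, where empty or zero values come back as False. The variable names are only for illustration.

```python
from __future__ import print_function

default_value = bool()
print(default_value)             # False, the default value

print(True == 1, False == 0)     # True True
count_of_true = True + True      # True behaves like the integer 1
print(count_of_true)             # 2

# Casting other values: zero and empty values are False, the rest are True
empty_values = [bool(0), bool(''), bool([])]
filled_values = [bool(5), bool('text'), bool([0])]
print(empty_values)              # [False, False, False]
print(filled_values)             # [True, True, True]
```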
Lists, tuples and sets are a way to store a collection of values that we can retrieve with the use of iteration. Lists and tuples hold data in an ordered manner, so we can also retrieve values by location in the collection, and with lists we can insert records at a specific location. Sets do not keep their values in a defined order.
Lists
Lists are defined with square brackets and can contain just about any Python object. This is one of the most commonly used collection data types. We can update the contents of the list without needing to redefine the object.
Tuples
Tuples are defined with parentheses and can also contain just about any Python object. This data type is quick to define though is immutable. This means we cannot update a record within a tuple. Instead we must re-define the tuple to replace an object in it.
Sets
Sets are defined with curly braces and can contain a unique collection of immutable objects. This includes the basic data types and tuples (so long as the tuple child objects are also immutable). We cannot store lists within a set. Sets are very helpful in keeping a distinct collection of values though have limitations causing the list data type to be more commonly used.
In [ ]:
# Lists are a series of values comma separated and surrounded by square brackets
list() # a way to express a list
['value1', 2, 'value3', 4.0]
# Tuples are a series of values comma separated and surrounded by parentheses
tuple() # a way to express a tuple
('value1', 2, 'value3', 4.0)
# Sets are a series of distinct values comma separated and surrounded by curly braces
set() # a way to express a set
{'value1', 2, 'value3', 4.0};
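Here is a minimal sketch of the differences described above: updating a list in place, failing to update a tuple, and letting a set drop duplicate values. The file names and extensions are made up for illustration, and the try/except is only there so the notebook keeps running after the expected error.

```python
from __future__ import print_function

# Lists can be updated in place
file_names = ['a.txt', 'b.pdf']       # hypothetical file names
file_names.append('c.doc')            # add a value to the end
file_names.insert(0, 'first.log')     # insert a value at a specific location
file_names[1] = 'renamed.txt'         # replace the value at index 1
print(file_names)   # ['first.log', 'renamed.txt', 'b.pdf', 'c.doc']

# Tuples cannot be updated; trying to do so raises a TypeError
record = ('value1', 2, 4.0)
tuple_was_updated = True
try:
    record[0] = 'changed'
except TypeError:
    tuple_was_updated = False
print(tuple_was_updated)   # False
print(record[0])           # value1

# Sets keep only the distinct values
extensions = {'.pdf', '.txt', '.pdf', '.doc', '.txt'}
print(len(extensions))        # 3
print('.pdf' in extensions)   # True
```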
We can also store data as a key-value pair. Think of this as a collection, as described above, except we recall values by name instead of location. This is useful for storing a wide range of data, as we can give our values meaningful names that we can later use to recall our values, as opposed to having to recall the location.
A dictionary is defined with curly braces, similarly to a set, though requires a key-value pair (delimited with a colon) for each element. Also, unlike a set, a dictionary is mutable and can hold a wide range of data types, including lists, sets, and other dictionaries!
The one thing to note is that key names in a dictionary are always distinct. This means we will overwrite prior values if we try to re-use a key name. Prior to Python3.7, dictionaries were not guaranteed to be ordered, so we cannot rely on the order that we define keys in older versions. (Python3.6 happens to preserve insertion order, though only as an implementation detail; from Python3.7 onward it is part of the language.) If we need an ordered dictionary across versions, we can use the collections.OrderedDict class (we will use that later.)
In [ ]:
# if it is still confusing, see http://learnpythonthehardway.org/book/ex39.html
dict() # a way to express a dictionary
{'key1': 'value1', 'key2': 234, 234: 4.0};
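As a short sketch of recalling values by name, overwriting a key, and using collections.OrderedDict, consider the below. The file_info keys and values are hypothetical and only for illustration.

```python
from __future__ import print_function
from collections import OrderedDict

file_info = {'name': 'report.pdf', 'size': 2048}   # hypothetical values
print(file_info['name'])          # report.pdf

file_info['size'] = 4096          # re-using a key overwrites the prior value
file_info['tags'] = ['pdf', 'reviewed']   # a new key, holding a list
print(file_info['size'])          # 4096
print(len(file_info))             # 3

# .get() returns None instead of raising an error when a key is missing
print(file_info.get('owner'))     # None

# OrderedDict remembers the order in which keys were added
ordered = OrderedDict()
ordered['first'] = 1
ordered['second'] = 2
print(list(ordered.keys()))       # ['first', 'second']
```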
Now we can take these data types, defined above, and assign them to variables. Variables allow us to store and recall values using a more human friendly interface. For example, if the script needed to store a person's first name, we may want to define a variable called first_name and we may assign a string value to it, such as first_name = "chapin", using the = character.
In Python, we can use any alphanumeric character, plus the underscore, to define our variable. A few rules to follow:
Do not start a variable name with a number
Avoid names such as file, in, from, max, min, or other terms that are otherwise used in the Python syntax
In [ ]:
# Sample variable names
ants123 = 'bugs'
people = 6
_apples = 'red'
reallylongvariablename = 324.2342
thiSHASmiXedCaSe = '123123123123' # Not easy to read - right?
ThisIsEasierToRead = 234
soIsThis = True
_123 = '2376dfgxcvsd'
# It is a good idea to use underscores or camelCase to name variables
fileName = '1.txt'
file_path = '/home/'
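If we are unsure whether a name is safe to use, one possible way to check is sketched below using the standard library keyword module and the str.isidentifier() method. Note that isidentifier() assumes Python3, and the candidate names are just examples.

```python
from __future__ import print_function
import keyword

# Keywords are reserved by the Python syntax and cannot be used as names
in_is_keyword = keyword.iskeyword('in')
from_is_keyword = keyword.iskeyword('from')
print(in_is_keyword, from_is_keyword)    # True True

# max is a built-in function, not a keyword. Python will let us re-use
# the name, though doing so hides the built-in, so it is best avoided
max_is_keyword = keyword.iskeyword('max')
print(max_is_keyword)                    # False

# isidentifier() checks that a name only uses allowed characters
print('file_path'.isidentifier())   # True
print('_123'.isidentifier())        # True
print('123abc'.isidentifier())      # False: starts with a number
print('file-path'.isidentifier())   # False: a dash is not allowed
```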
When in doubt, use the guidelines shown in the below code block.
In [ ]:
# Some guidelines on Python development
import this
In case the above doesn't show (if you run it more than once, it'll disappear), here it is below. You can always type it into your interpreter as well.
The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
In [ ]:
# to see if a value is the same as another...
print(True == True) # Check if two values are equal
string_value = 'Value 1'
print(string_value is None)
print(isinstance('is this an int?', int)) # check if a value is a certain data type
# Or opposite
print('value 1' != 'value 2')
print(True is not False)
print(2 > 0) # greater than
print(2 >= 1) # greater than or equal to
print(4 < 6) # less than
print(4 <= 6) # less than or equal to
# We can combine these statements with `and` or `or` statements
print(True == True and 123 < 234)
print(True != False or 534 >= 234)
In [ ]:
##
# Special Characters
##
# If you wish to use a single or double quote within the value of a string
# you must escape it using an `\`
example = """in example, i cannot type \"\"\" without the `\\` character in front of it
otherwise it will end the string like this"""
print(example)
One common operation of a script is looping over a set of data. This may be a collection of files or a set of timezones, and both can be stepped through with loops.
The more common loop is known as a for loop. A for loop generally has 5 components: the for keyword, a variable name to hold the current value, the in keyword, the collection to iterate over, and a closing colon character.
In the below, we have a list of numbers that we want to iterate over and print out. In each iteration, we add one to the value available through the number variable, and print it to the console.
Notice that all logic within the loop is indented (preferably with 4 spaces). Any unindented code will be run after the loop completes.
In [ ]:
list_of_numbers = [1, 2, 3, 4, 5, 6, 7, 8]
for number in list_of_numbers:
    # here is where we can do something with the number
    print(number+1)
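To show the difference between indented and unindented code around a for loop, here is a small sketch that adds up the same list of numbers. The running_total name is my own.

```python
from __future__ import print_function

list_of_numbers = [1, 2, 3, 4, 5, 6, 7, 8]
running_total = 0
for number in list_of_numbers:
    # indented, so this line runs once for each number in the list
    running_total = running_total + number
# not indented, so this line runs a single time after the loop completes
print(running_total)   # 36
```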
While loops are another type of iterator that allows us to continue to iterate as long as the logic evaluates to True. For example, we could scan a collection of files looking for one that ends in .pdf. Using a while loop we could continue to step through a directory until we see a file with a pdf extension and stop at that point in time.
A while loop is simpler in formation, starting with while and followed by the logical statement, and capped with a colon character. The below loop is effectively the same as the above for loop.
As with the for loop, we need to indent (with 4 spaces preferably) any logic we want within this loop statement.
In [ ]:
counter = 0
size_of_list = 8
while counter < size_of_list:
    print(list_of_numbers[counter]+1)
    counter += 1 # Add 1 to the counter
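The .pdf scenario described above could look something like the below sketch. It steps through a plain list of made-up file names rather than a real directory, and stops at the first name ending in .pdf (or at the end of the list if there is none).

```python
from __future__ import print_function

# hypothetical file names standing in for the contents of a directory
file_names = ['notes.txt', 'image.jpg', 'report.pdf', 'archive.zip']

index = 0
# keep stepping while there are names left and the current one is not a pdf
while index < len(file_names) and not file_names[index].endswith('.pdf'):
    index += 1

# if we stopped before the end of the list, we found a pdf
pdf_was_found = index < len(file_names)
print(pdf_was_found)        # True
print(index)                # 2
print(file_names[index])    # report.pdf
```

The check on len(file_names) comes first so that we never ask for a position past the end of the list.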
In [ ]:
##
# Functions
##
# This is cool stuff
"""
Ok, so you are writing code and realize that you will want to use the same
bit over and over - right? we use a function for this.
"""
def double_the_number(number):
    """
    This function is specified by the `def` followed by the name of the function
    followed by `()` that contain any values that the function needs to run.
    Code inside the function needs to be indented (preferably with 4 spaces),
    otherwise it won't run inside of it (plus it looks nice)
    """
    new_number = number * 2
    # The `return` statement returns the value
    return new_number
# To call a function, we call it by name, and pass the value
print(double_the_number(5))
# if we assign it to a variable, then we can use the returned data
double = double_the_number(67)
print(double)